@razvanMiu razvanMiu commented Oct 24, 2025

Description

For PDFs fetched by the Web connector, set the document semantic_identifier to the PDF metadata title when available.

How Has This Been Tested?

  • PDF with title: Confirms semantic_identifier equals metadata title.
  • PDF without title: Confirms fallback to id from URL.
  • Non-PDF content: Confirms no change in behavior.
  • Mixed sitemap/pages: Ensures only PDFs are affected and logging remains clean.
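
For readers following along, the title-or-fallback rule exercised by the cases above could be sketched roughly as follows. This is not the code from this PR: the helper name pick_semantic_identifier is hypothetical, the metadata title is assumed to have been extracted already by whatever PDF reader the connector uses, and the fallback shown (last path segment of the URL, else the URL itself) is only one reading of "id from URL".

```python
from typing import Optional
from urllib.parse import unquote, urlparse


def pick_semantic_identifier(url: str, pdf_title: Optional[str]) -> str:
    """Hypothetical helper: prefer the PDF metadata title, else fall back to the URL."""
    # Treat None, empty, and whitespace-only titles as "no title available".
    title = (pdf_title or "").strip()
    if title:
        return title

    # Assumed fallback: the last path segment of the URL,
    # or the full URL when the path is empty.
    last_segment = unquote(urlparse(url).path.rstrip("/").split("/")[-1])
    return last_segment or url
```

Treating whitespace-only titles as missing is a design choice in this sketch; some PDFs carry a blank title field, and using it verbatim would produce an empty identifier.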

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

github-actions bot commented Jan 8, 2026

This PR is stale because it has been open 75 days with no activity. Remove stale label or comment or this will be closed in 15 days.

@github-actions github-actions bot added the Stale label Jan 8, 2026